31.1 Ontology
Nevertheless, the sheer volume of data (sequences and structures) emerging from
experimental molecular biology is a powerful driver for treating it ontologically in
order to allow human beings, and machines, to make some sense of it. Without an
ontology the mass of data would be unstructured and, hence, overwhelming to the
human mind, for it would be very difficult to discern meaningful paths through it.
In bioinformatics, ontology typically has a more restricted definition, namely “a
working model of entities and interactions”. 4 Such models include a glossary of terms
as a basic component. The other components of a model are generally taken to be
the following (note that ontologists have made little attempt to define these words
carefully and unambiguously): classes or categories (sets of objects); attributes or
concepts, which may be either primitive (necessary conditions for membership of a
class) or defined (necessary and sufficient conditions for membership); arbitrary
rules (sometimes called axioms) constraining class membership, which might be
considered part of the glossary of terms; relations between classes or concepts,
which may be either taxonomic (hierarchical) or associative; instantiations (concrete
examples, i.e., individual objects); and events that change attributes, relations,
or both.
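To make the distinction between primitive and defined classes concrete, here is a minimal sketch in Python of a toy model with classes, a taxonomic (is-a) relation, associative relations, instantiations, and an event that changes attributes. Axioms are omitted. All class names, attribute strings, and helper functions (`OntologyClass`, `infer_membership`, `apply_event`, and so on) are hypothetical illustrations, not part of any real bioinformatics ontology or library.

```python
from __future__ import annotations
from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass
class OntologyClass:
    """A class (category) in a hypothetical toy ontology."""
    name: str
    parent: Optional[OntologyClass] = None  # taxonomic (is-a) relation
    conditions: set[str] = field(default_factory=set)  # attributes required of members
    defined: bool = False  # True: conditions are necessary AND sufficient

    def ancestors(self) -> Iterator[OntologyClass]:
        node = self.parent
        while node is not None:
            yield node
            node = node.parent

    def all_conditions(self) -> set[str]:
        # A subclass inherits every necessary condition of its superclasses.
        conds = set(self.conditions)
        for anc in self.ancestors():
            conds |= anc.conditions
        return conds


@dataclass
class Instance:
    """An instantiation: a concrete, individual object."""
    name: str
    attributes: set[str]
    asserted_class: Optional[OntologyClass] = None
    associations: dict[str, str] = field(default_factory=dict)  # associative relations


def meets_necessary_conditions(inst: Instance, cls: OntologyClass) -> bool:
    return cls.all_conditions() <= inst.attributes


def infer_membership(inst: Instance, cls: OntologyClass) -> bool:
    # Defined class: satisfying all conditions is sufficient for membership.
    if cls.defined:
        return meets_necessary_conditions(inst, cls)
    # Primitive class: conditions are only necessary, so membership must be
    # asserted (directly, or via an asserted subclass).
    asserted = inst.asserted_class
    return asserted is not None and (asserted is cls or cls in asserted.ancestors())


def apply_event(inst: Instance, gained=(), lost=()) -> None:
    """An event that changes an instance's attributes."""
    inst.attributes = (inst.attributes - set(lost)) | set(gained)


# Hypothetical usage: Protein is primitive, Kinase is defined.
protein = OntologyClass("Protein", conditions={"polypeptide"})
kinase = OntologyClass("Kinase", parent=protein,
                       conditions={"catalyses phosphorylation"}, defined=True)

hk = Instance("hexokinase", {"polypeptide", "catalyses phosphorylation"},
              associations={"substrate": "glucose"})
chain = Instance("unidentified_chain", {"polypeptide"})

print(infer_membership(hk, kinase))              # inferred: Kinase is defined
print(meets_necessary_conditions(chain, protein))
print(infer_membership(chain, protein))          # not inferred: Protein is primitive
```

The design choice here is deliberately simple: attributes are plain strings and conditions are set inclusion, which is enough to show why an object that meets the necessary conditions of a primitive class still cannot be classified into it automatically, whereas membership of a defined class follows from its attributes alone.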
An ontology, which belongs to the category of semantics, is necessarily subordinate
to the rules for its construction, which belong to the category of inference, much as
a system of classification (Sect. 31.2) depends on rules. The ontology is then
superordinate to mark-up, which belongs to the category of syntax. A familiar
mark-up technology is XML (“Extensible Markup Language”).
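As an illustration of ontology sitting above mark-up, the following sketch serializes two classes of a toy ontology as XML using Python's standard `xml.etree.ElementTree` module and then parses them back. The element and attribute names (`ontology`, `class`, `isA`, `condition`, `defined`) are a hypothetical schema invented for this example; real bio-ontologies are usually exchanged in standard formats such as OWL or the OBO format rather than an ad hoc schema like this.

```python
import xml.etree.ElementTree as ET


def class_element(name, conditions, parent=None, defined=False):
    """Serialize one ontology class as an XML element (hypothetical schema)."""
    elem = ET.Element("class", {"name": name, "defined": str(defined).lower()})
    if parent is not None:
        ET.SubElement(elem, "isA", {"ref": parent})  # taxonomic relation
    for cond in sorted(conditions):
        ET.SubElement(elem, "condition").text = cond
    return elem


root = ET.Element("ontology", {"name": "toy"})
root.append(class_element("Protein", {"polypeptide"}))
root.append(class_element("Kinase", {"catalyses phosphorylation"},
                          parent="Protein", defined=True))

xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)

# Any XML parser can recover the structure (syntax); what <isA> and
# "defined" mean is supplied by the ontology (semantics), not by XML itself.
parsed = ET.fromstring(xml_text)
for cls in parsed.findall("class"):
    parent = cls.find("isA")
    print(cls.get("name"), "->", parent.get("ref") if parent is not None else None)
```

The point of the round trip is that XML guarantees only a well-formed tree; two programs agree on what the tree means only if they share the same ontology.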
Mark-up is in turn superordinate to encoding in a form suitable for the computer. 5
Mark-up is essential for realizing the Semantic Web, an extension of the World
Wide Web that enables machines to “understand” the meaning of data on the web. 6
The Semantic Web comprises data stored in a standard format and linked by explicit
relationships that might allow machines to interpret the data, enabling them to identify
and extract connections between different pieces of data and use these to draw new
4 Each model (such as RiboWeb or EcoCyc) is typically called an “ontology”; hence, we
have the Gene Ontology, the Transparent Access to Multiple Bioinformatics Information Sources
(TAMBIS) Ontology (Baker et al. 1999), and so forth. If ontology is given the restricted meaning
of the study of classes of objects, then “an ontology” like TAMBIS can be considered the
product of ontological inquiry.
5 It is worth noting that chemists tackled many of these matters long ago; databases such as
Beilstein and Chemical Abstracts have existed for more than a century, and complex molecular
structures (albeit much simpler than a protein) have been encoded as strings of characters
using SMILES (simplified molecular-input line-entry system). See the Handbook of
Chemoinformatics: From Data to Knowledge (ed. J. Gasteiger) in four volumes (Wiley-VCH,
Weinheim, 2003) for a comprehensive overview.
6 Machines can understand data in the sense that they can interpret and analyse it, using algorithms
and statistical methods to uncover patterns and relationships. They can process large datasets,
identify correlations between different variables, and draw conclusions from the data; these
conclusions may seem surprising and revelatory because no human being could hold such large
quantities of data in mind.